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Abstract 

Thalassospira bacteria are widespread and have been isolated from various marine environments. Less is known about their 
genetic diversity and biogeography, as well as their role in marine environments, many of them cannot be discriminated 
merely using the 16S rRNA gene. To address these issues, in this report, the phylogenetic analysis of 58 strains from 
seawater and deep sea sediments were carried out using the multilocus sequence analysis (IVILSA) based on acsA, aroE, gyrB, 
mutL, rpoD and trpB genes, and the DNA-DNA hybridization (DDH) and average nucleotide identity (ANI) based on genome 
sequences. The MLSA analysis demonstrated that the 58 strains were clearly separated into 15 lineages, corresponding to 
seven validly described species and eight potential novel species. The DDH and ANI values further confirmed the validity of 
the IVILSA analysis and eight potential novel species. The IVILSA interspecies gap of the genus Thalassospira was determined 
to be 96.16-97.12% sequence identity on the basis of the combined analyses of the DDH and IVILSA, while the ANIm 
interspecies gap was 95.76-97.20% based on the in silico DDH analysis. Meanwhile, phylogenetic analyses showed that the 
Thalassospira bacteria exhibited distribution pattern to a certain degree according to geographic regions. Moreover, they 
clustered together according to the habitats depth. For short, the phylogenetic analyses and biogeography of the 
Thalassospira bacteria were systematically investigated for the first time. These results will be helpful to explore further their 
ecological role and adaptive evolution in marine environments. 
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Introduction 

Thalassospira are a genus consisting of Gram-negative, motile, 
vibrio- or spiral-sliaped, halotolerant and chemoheterotrophic 
bacteria belonging to the family Rhodospirillaceae within the class 
Alphaproteobacteria and was created by Lopez-Lopez et al in 2002 
[1]. In the following years, more and more isolates of 
Thalassospira were obtained from various marine environments 
and identified using the polyphasic taxonomy. Currently, the 
Thalassospira genus has accommodated eight species: T. lucen- 
tensis [1], T. profundimaris [2], T. xiamenensis [2], T. tepidiphila 
[3], T. xianhensis [4], T. alkalitolerans and T. mesophila [5], T. 
povalilytica [6], except a non-vahdly published species named T. 
permensis [7]. 

We found that bacteria of Thalassospira were widespread in the 
marine environments and occupied a variety of ecological niches, 
such as surface and deep seawater, deep sediment, halobios etc. 



covering the Pacific Ocean, the Atlantic Ocean, the Indian Ocean 
and even the Arctic Ocean [8—11]. They aroused extensive 
attention because of their potential in eliminating marine oil 
pollution, especially in polycyclic aromatic hydrocarbons (PAHs) 
degradation [3,4,7,12]. Recently, Thalassospira were found to 
closely related with the phosphorus cycling in marine environ- 
ments, especially in the oligotrophic open ocean environments 
[13,14]. Additionally, some strains of this genus can produce 
thalassospiramides [15,16], beta-galactosidase [17] and biosurfac- 
tants [18]. Accumulation of these bacteria with different functions 
and originations urges analysis on their diversity, evolution and 
geographic distribution. 

In recent years, a large number of bacteria within the genus 
Thalassospira were isolated from various marine samples enriched 
with different hydrocarbons, including hexadecane, BTEX (ben- 
zene, toluene, ethylbenzene and xylene), and crude oil in our 
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laboratory. Nevertheless, most of the Thalassospira bacteria 
shared relatively high identities to the 16S rRNA gene (>98%) 
with each other, with only one exception of T. lucentensis DSM 
14000^, thus they could not be classified based upon the 16S 
rRNA gene. Therefore, an efiFective method is imperative to 
discriminate these closely related strains. 

Since 2005, MLSA has rapidly become a powerful tool and has 
been performed frequently for the taxonomy and phylogenetic 
analysis of the closely related strains [19-21]. In general, the 
several housekeeping genes are widely applied in MLSA schemes, 
such as acsA, gyrB, mutL, rpoD, etc [22-25], because they encode 
core metabolic enzymes and own by all members of the organisms. 
Furthermore, the concatenated housekeeping genes could over- 
come conflicting signals from horizontal gene transfer and 
recombination. In addition, DNA fingerprinting [26,27], DNA- 
DNA hybridization (DDH) [21,23] and average nucleotide 
identity (ANI) [28,29] are dependable methods for identification 
of the closely related strains. 

In this study, we employed MLSA using six housekeeping genes, 
acsA, aroE, gyrB, mulL, rpoD and trpB, combined with the 16S 
rRNA gene, DDH and ANI values from draft genome sequences, 
to explore the diversity, and geographic distribution of the 
Thalassospira bacteria originating from diverse marine environ- 
ments. 

Materials and Methods 

Ethics Statement 

The bacterial strains in this study were isolated from the various 
areas beyond the international sea area or the exclusive economic 
zone of China. The type strains were obtained from the public 
culture collections. So no specific permissions were required for 
these collections. Moreover, the sample did not involve endan- 
gered or protected species. 

Bacterial strains 

A total of 58 Thalassospira strains were analyzed in this study. 
Specifically, 52 Thalassospira strains were isolated in our 
laboratory and deposited at the Marine Culture Collection of 
China (MCCC), including two type strains, T. xiamenensis M-b^ 
and T. profundimaris WPO0211^. T. lepidiphila 1-lB^ was 
generous gift from Dr. Kazuya Watanabe [3]; T. lucentensis DSM 
14000'^, T. xianhensis CGMCC 1.6849'^, T. alkalitolerans JCM 
18968'^, T. mesophila JCM 18969'^ and T. permensis NBRC 
106175"^ (non-valid) were purchased from DSMZ, CGMCC, JCM 
and NBRC, respectively. The detail information of all strains was 
presented in Table 1. They were isolated from 32 different 
locations across the Pacific Ocean, the Atiantic Ocean, the Indian 
Ocean, the Yellow Sea, the East China Sea and the Mediterra- 
nean Sea. The sampling sites were shown in Figure SI. At the time 
of this study, type strain T. povalilytica Zumi 95^ was not reported 
and therefore not included in our study. 

Cultivation and DNA preparation 

AU strains were incubated on 216 L agar medium 
(CH^COONa 1.0 g, tryptone 10.0 g, yeast extract 2.0 g, sodium 
citrate 0.5 g, NH4NO3 0.2 g, agar 15 g, 1 L sea water, pH 7.5) 
[30] at 28°C for 48 h. Genomic DNA was extracted using the SBS 
extraction kit (SBS Genetech Co., Ltd. in Shanghai, China) 
following the manufacturer's instructions. 



PCR amplification and sequencing of 16S rRNA and six 

housekeeping genes 

The 16S rRNA gene was amplified using universal primers 27F 
and 1492R, while six housekeeping genes were amplified, 
respectively, with specific primers designed in Kght of the genome 
sequences of the six type strains using the Primer Premier 5.0 
(Table SI). These genes were amplified under nearly identical 
conditions. The 50 |tL reaction mixtures contained 37 |tL of 
double distilled water, 5 nL of lOxEx Taq buffer (Mg^^ Plus), 
4 nL of dNTP mixture (10 mM), 1 nL of each primer (20 jtM), 
1 \\L of genomic DNA template (10-30 ng/(j,L), 1 jtL of Ex 
Taq^^ DNA polymerase (TaKaRa, 5 U/fiL). The reaction 
mixture was subjected to the following parameters in a Biometra 
T-Professional thermocycler (Biometra; Goettingen, Germany): 
initial denaturation at 94°C for 5 min; then 30 cycles of 30 s of 
denaturation at 94°C, 30 s of annealing at 55°G, and 1.0-1.5 min 
of extension at 72°C; and a final extension at 72°C for 10 min. 
More detail information was listed in Table SI. After amplifica- 
tion, these PCR amplicons were separated by electrophoresis on a 
1 % agarose gel and then purified using the PCR purification kit 
(Axygen Scientific, Inc., USA) according to the manufacturer's 
instructions. Finally, the purified amplicons were sequenced with 
the ABI3730xl platform (Shanghai Majorbio Bio- Pharm Tech- 
nology Co., Ltd., China) using the corresponding sequencing 
primers (Table SI). 

The sequences of the 16S rRNA gene and six housekeeping 
genes were assembled and modified using DNAMAN version 5.0. 
AU sequences were submitted to the GenBank database. The 
accession numbers were assigned and hsted in Table S2. 

Analysis of nucleotide diversity 

The determined sequences of the 16S rRNA gene and six 
housekeeping genes were compared to the National Center of 
Biotechnology Information (NCBI) database using BLAST. For 
each housekeeping gene, different alleles were assigned succes- 
sively different numbers, and a unique combination of 6 allele 
numbers (allehc profile) unambiguously was defined the sequence 
type (ST) of a bacterium. When several strains shared the same 
allelic profiles, they unquestionably possessed the same ST. A total 
of allelic profiles were utilized for subsequent analysis. Meantime, 
the genetic distances and serjuence similarities of the single gene 
were calculated using Kimura's 2-parameter model [31] with the 
MEGA version 5.0 [32]. The numbers of polymorphic sites and 
the mean G+C (guanine-cytosine) content in each gene were 
obtained by the MEGA version 5.0. The nucleotide diversity (it) 
per site for each gene was performed using the data analysis in 
molecular biology and evolution (DAMBE) version 5.0 [33]. In 
addition, the selective pressure on six housekeeping genes were 
evaluated with the ratios of Ka./Ks (ifa: the number of non- 
synonymous substitutions per non-synonymous site, A^s: the 
number of synonymous substitutions per synonymous site) using 
the MEGA version 5.0. 

Population genetic analyses 

The assessment of substitution saturation for each gene was 
determined using the DAMBE version 5.0. In order to measure 
the possibility of reticulate evolution, split decomposition analysis 
was performed for all single gene and the concatenated genes 
using the neighbor-net algorithm in SplitsTree version 4.0 by 
default settings [34]. The possible existence of the recombination 
in all genes was estimated using the pairwise homoplasy index 
(PHI) test as implemented in SplitsTree version 4.0. Meantime, the 
level of Linkage disequilibrium, as measured by the standardized 
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Table 1. The 58 Thalassospira bacteria in this study. 





MCCC NO. 


Original No. 


Closest type strdin of 16S 
rRNA gene (Identity %) 


Geographic origin 


Source 


Depth (m) 


1 A00207 


wpo2ir 


T. profundimaris 


The Pacific Ocean 


Sediment 


4,480 


1A00209 




T. xiamenensis 


The Taiwan strait 


Superficial seawater 


0 


1 A00350 


R8-17 


T. profundimaris (99.7) 


The Indian Ocean 


Deep seawater 


668 


1A00370 


R8-8 


T. profundimaris (99.4) 


The Indian Ocean 


Deep seawater 


668 


1A00383 


QMT2^ 


T. lucentensis 


The Mediterranean Sea 


Upper seawater 


100 


1 A00385 


R4-5 


T. profundimaris (99.4) 


The Indian Ocean 


Deep seawater 


2,268 


1 A00624 


SMB34'^ 


T. permensis 


Berezniki, Perm region, Russia 


Soil 


0 


1A00753 


MBE#61^ 


T. alkalitolerans 


The coastal area, Japan 


Sunken bamboo 


0 


1 A00756 


MBE#74'^ 


T. mesophila 


The coastal area, Japan 


Sunken bamboo 


0 


1A01013 


W3-1 


T. xiamenensis (99.4) 


The Pacific Ocean 


Sediment 


2,682 


1A01017 


DBT-2 


T. xiamenensis (99.4) 


The Pacific Ocean 


Sediment 


2,682 


1A01041 


PTG4-18 


T. xiamenensis (99.5) 


The Indian Ocean 


Sediment 


2,946 


1A01051 


MARC2P1 


T. xiamenensis (99.5) 


The Atlantic Ocean 


Sediment 


3,542 


1A01057 


MARC4CW 


T. profundimaris (99.7) 


The Atlantic Ocean 


Sediment 


3,962 


1A01072 


MARC2C0D 


T. xiamenensis (99.5) 


The Atlantic Ocean 


Sediment 


3,542 


1A01103 


PB8B 


T. profundimaris (99.6) 


The Indian Ocean 


Deep seawater 


1,800 


1A01109 


PB9B 


T. profundimaris (99.4) 


The Indian Ocean 


Deep seawater 


1,800 


1A01140 


MARMC3G 


T. xiamenensis (99.5) 


The Atlantic Ocean 


Sediment 


4,046 


1A01148 


MARC2C07 


T. profundimaris (99.7) 


The Atlantic Ocean 


Sediment 


3,542 


1A01166 


35 


T. profundimaris (99.4) 


The Indian Ocean 


Sediment 


2851 


1A01167 


78 


T. profundimaris (99.4) 


The Indian Ocean 


Sediment 


2,851 


1A01172 


MARC2PPNC 


T. profundimaris (99.4) 


The Atlantic Ocean 


Sediment 


3,542 


1A01275 


MC2-14 


T. profundimaris (99.7) 


The Atlantic Ocean 


Deep seawate 


3,542 


1A01288 


S3 1-2-1 


T. profundimaris (98.3) 


The Indian Ocean 


Superficial seawater 


0 


1A01300 


S27-11 


T. xiamenensis (99.5) 


The Indian Ocean 


Superficial seawater 


0 


1A01318 


S25-3-2 


T. profundimaris (98.3) 


The Indian Ocean 


Superficial seawater 


0 


1A01330 


S29-3-A 


T. xiamenensis (99.5) 


The Indian Ocean 


Superficial seawater 


0 


1A01423 


S25-4 


T. profundimaris (98.3) 


The Indian Ocean 


Superficial seawater 


0 


1 AO 1448 


S3 1-7 


T. xiamenensis (99.5) 


The Indian Ocean 


Superficial seawater 


0 


1 AO 1449 


S3 1-6 


7. profundimaris (98.3) 


The Indian Ocean 


Superficial seawater 


0 


1A02030 


PR54-5 


T. profundimaris (99.5) 


The Indian Ocean 


Deep seawater 


4,146 


1A02031 


2CR55-14 


7. profundimaris (99.5) 


The Indian Ocean 


Deep seawater 


3,946 


1A02039 


PR57-5 


7. profundimaris (99.7) 


The Indian Ocean 


Deep seawater 


3,546 


1 A02040 


PR57-2 


7. profundimaris (99.7) 


The Indian Ocean 


Deep seawater 


3,546 


1A02041 


2CR55-15 


7. profundimaris (99.6) 


The Indian Ocean 


Deep seawater 


3,946 


1 A02042 


2CR-54-5 


7. profundimaris (99.6) 


The Indian Ocean 


Deep seawater 


4,146 


1A02059 


NIC1013S-2 


7. profundimaris (99.4) 


The Indian Ocean 


Deep seawater 


2,455 


1A02060 


RC911-4 


7. profundimaris (99.4) 


The Indian Ocean 


Deep seawater 


668 


1 A02093 


PC99-15 


7. profundimaris (99.7) 


The Indian Ocean 


Deep seawater 


1,068 


1 A02094 


MC2-9 


7. xiamenensis (99.5) 


The Atlantic Ocean 


Overlaid seawater 


3,542 


1A02096 


PC92-18 


7. profundimaris (99.7) 


The Indian Ocean 


Deep seawater 


2,468 


1A02616 


P.4T 


7. xianhensis 


Xianhe town, China 


Soil 


0 


1A02753 


iB2 


T. xiamenensis (99.4) 


Yellow Sea, China 


Upper seawater 


10 


1A02758 


IB13 


7. xiamenensis (99.2) 


Yellow Sea, China 


Upper seawater 


10 


1 A02767 


ID7 


7. xiamenensis (99.2) 


Yellow Sea, China 


Upper seawater 


5 


1A02785 


IHl 


7. xiamenensis (99.2) 


Yellow Sea, China 


Upper seawater 


5 


1 A02803 


IKl 


7. profundimaris (100) 


Yellow Sea, China 


Upper seawater 


40 


1 A02843 


IP8 


7. xiamenensis (99.2) 


Yellow Sea, China 


Upper seawater 


30 


1 A02866 


IU14 


7. xiamenensis (99.2) 


Yellow Sea, China 


Upper seawater 


30 
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Table 1. Cont. 



MCCC NO. 


Original No. 


Closest type strain of 16S 
rRNA gene (identity %) 


Geographic origin 


Source 


Depth (m) 


1A02873 


IV17 


7. xiamenensis (99.2) 


Yellow Sea, China 


Upper seawater 


40 


1A02878 


1X2 


T. xiamenensis (99.4) 


Yellow Sea, China 


Upper seawater 


50 


1 A02898 


JB7 


T. profundimaris (100) 


Yellow Sea, China 


Upper seawater 


2.5 


1A02921 


JG3 


T. xiamenensis (99.4) 


East China Sea 


Upper seawater 


30 


1A02935 


JKl 


T. xiamenensis (99.2) 


East China Sea 


Upper seawater 


30 


1A03005 


L6 


T. xiamenensis (99.5) 


The Atlantic Ocean 


Sediment 


3,962 


1A03052 


AS-12-1 1 


T. xiamenensis (99.5) 


The Indian Ocean 


Sediment 


1,420 


1A03093 


AS-M6-1 1 


T. xiamenensis (99.5) 


The Atlantic Ocean 


Sediment 


2,987 


1A03514 


1-1 


T. tepidiptiila 


Kamaishi Bay, Japan 


Upper seawater 


15 



doi:10.1371/journal.pone.0106353.t001 



index of association {IS A), were investigated using the Linkage 
ANalysis (LIAN) version 3.6 (http://guanine.evolbio.mpg.de/cgi- 
bin/lian/lian.cgi.pl/ query) [35] . In addition, radiation times were 
estimated with each other based on the rate of synonymous 
substitution for single housekeeping genes and the concatenated 
housekeeping genes. These rates were normalized with the known 
radiation time between Escherichia coli and Salmonella enterica 
(ca. 1 million years) [36] . 

Phylogenetic analysis based on the 16S rRNA gene, 
individual housekeeping gene, the six concatenated 
housekeeping genes and DDH 

The phylogenetic analysis was performed using MEGA version 
5.0. The sequences of the 16S rRNA gene and six housekeeping 
genes were aligned using ClustalW option implemented in MEGA. 
A series of phylogenetic trees based on the 1 6S rRNA gene and 
single housekeeping gene were constructed respectively using the 
neighbour-joining algorithm by Kimura's 2-parameter model with 
the MEGA version 5.0. A phylogenetic tree based on the 
concatenated gene (in the following order: acsA, aroE, gyrB, mutL, 
rpoD and trpB) was also constructed. Rhodospirillum ruhrum 
ATCC 11170^ was used as the outgroup in all phylogenetic 
analysis. Bootstrap confidence analysis was carried out with 1000 
replications for evaluating the robustness of the tree topologies. 
Additionally, phylogenies from distance matrix by neighbour- 
joining method the clustering tree based on the DDH was 
constructed with the web server (http://genome.csdb.cn/cgi-bin/ 
emboss/ fneighbor). 

Correlation analysis between similarities of the IVILSA, 
DDH and ANI 

The genome sequences of sixteen representative Thalassospira 
strains selected based on the phylogenetic analysis were deter- 
mined by Shanghai Majorbio Bio-pharm Technology Co., Ltd. 
(Shanghai, China), using Solexa paired-end (500 bp library) 
sequencing technology. A total of 500 M bp clean data for each 
strain was generated to reach about 100-fold depth of coverage 
with an lUumina/Solexa Genome Analyzer IIx (lUumina, 
SanDiego, CA). The clean data was assembled by SOAPdenovo2 
[37]. Two of them {T. profundimaris WP0211^ and T. 
xiamenensis had been reported by our lab [38,39]. DDH 

values were estimated using the genome-to-genome distance 
calculator website service (GGDC 2.0) [40,41]. ANI values of 
these strains were calculated using the software JSpecies (VI. 2.1) 



[29]. Correlation analysis between similarities of the MLSA and 
DDH was simulated using the R version 3.0.1. 

Results 

Individual gene analyses 

The sequences of the 16S RNA gene and the six housekeeping 
genes of 58 strains were determined. The characteristics of gene(s) 
were listed in Table 2. 

The length of the 1 6S RNA gene was 1 ,444 bp, except for three 
strains named MCCC 1A00383, MCCC 1A00753 and MCCC 
1A00753 (1,449 bp). The number of alleles and polymorphic sites 
of the 16S RNA gene were 20 and 64, respectively; while the 
proportion of polymorphic sites and nucleotide diversity were 
4.42% (4.43%) and 0.012, respectively. The mean G-nC content 
was 55.1 mol%. The range of genetic distance of the 16S RNA 
gene was 0.000-0.027 (mean 0.008), corresponding to 97.3-100% 
identity. Measure substitution saturation of the 16S rRNA gene 
with DAMBE indicated that little saturation existed. The highly 
conserved 168 rRNA gene was unsuitable for distinguishing 58 
Thalassospira strains. Despite the low resolution of the 16S rRNA 
gene, 58 strains were stUl divided into 1 1 groups labeled with the 
group name A to O (Figure 1), the same group names used as in 
the MLSA phylogenetic tree. In brief, the largest Group A/B 
contained 25 strains with two type strains, T. xiamenensis M-5^ 
and T. permensis NBRC106175^ (non-valid). The second Group 
H/I/K consisted of 12 isolates including the type strains T. 
tepidiphila 1 - 1 B^. The Group J/L consisted of 3 isolates including 
the type strain T. profundimaris WPO0211^. Three type strains, 
T. lucentensis DSM 14000'^, T. alkalitolerans JCM 18968'^ and T. 
mesophila JCM 18969^, located on the root of the phylogenetic 
tree of the 16S rRNA gene and each represented a group (Group 
M, D and O). The group C was represented by the type strain T. 
xianhensis P-4^. The strains of group E formed a clade with 
recently described type strain T. povalilytica Zumi 95^. The rest of 
three groups (F, G and N) represented potential novel taxa based 
on the analyses below. However, low bootstrap values at many 
nodes of the 16S rRNA gene phylogenetic tree implied that 
topology structure of tree was unstable and the taxonomic 
afirdiation of all strains was inaccurate. Therefore, the 16S rRNA 
gene was inappropriate for the phylogenetic analysis of the genus 
Thalassospira. 

As the 1 6S rRNA gene was lack of discriminatory power for the 
closely related strains of the genus Thalassospira, six housekeeping 
genes {acsA, aroE, gyrB, mutL, rpoD and trpB) were utilized for 
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assessing the diversity. The length of six housekeeping genes was 
from 510 bp for the rpoD genes to 918 bp for the gyrB gene. The 
allele numbers of the five housekeeping genes was similar except 
the IrpB gene. The aroE gene exhibited the highest proportion of 
polymorphic sites (53.4%), followed by acsA (43.2%) and gyrB 
(40. 1 %). Only 1 79 polymorphic sites (35. 1 %) were identified in the 
7poD gene. It was worthwhile to note that the mean G+C content 
of six housekeeping gene had significantly different. Similarly, the 
nucleotide diversity (ti, the average number of nucleotide 
differences per site between two randomly-selected isolates) ranged 
from 0.119 {rpoD) to 0.192 {aroE) (Table 2), which showed the 
different evolution rates of six housekeeping genes in the MLSA 
scheme. Among the six housekeeping genes, the aroE gene 
exhibited the highest resolution based on the highest mean genetic 
distance (0.190), whereas rpoD displayed the lowest resolution with 
the lowest mean genetic distance (0.111). Six housekeeping genes 
showed low Ka./Ks ratios of 0.035-0.099, suggesting that they are 
under negative selection pressure. These results demonstrated that 
the six housekeeping genes were better than the 16S rRNA gene 
for differentiating Thalassospira bacteria; moreover, the aroE gene 
possessed the highest resolution power among all the tested 
housekeeping genes. 

In this study, 3 1 unique STs were identified on basis of the six 
concatenated genes, suggesting high genotypic diversity (Table 
S3). Of these, 16 STs (27.59%) corresponded to single strain and 
14 STs (60.34%) corresponded to two to four isolates. Whereas, 
the 28th ST (12.07%) was the best represented ST and found in 
seven different bacteria. Prior to the phylogenetic analysis, the 
examination of substitution saturation and recombination for six 
housekeeping genes were subjected using the above-mentioned 
software, respectively. The measures of the saturation test for 
single gene with DAMBE version 5.0 shown that the index oi Igg 
was smaller than the index of Iss.c ™ each gene, suggesting no 
signal of substitution saturation was not found (Table S4). At the 
same time, the analysis of split decomposition indicated some 
degree of reticulation (i.e. conflicting phylogenetic signals) for two 
genes, acsA and rpoD (Figure S2-S7). The PHI test also 
demonstrated the presence of recombination for the acsA gene 
(p<0.05) and the rpoD gene (p<0.05). Nevertheless, the parallel- 
ogram network of the two genes was not very obvious, suggesting 
that intragenic recombination was not remarkable. Besides, the 
intergenic recombination was estimated using the LIAN version 
3.6. The Ia and standardized index of association {IS A) were 5.32 
and 0.8634, respectively. Therefore, the nuU hypothesis of linkage 
equilibrium (IS ^ = 0) by both parametric and Monte Carlo 
methods (100 resamplings) was rejected, indicating linkage 
disequilibrium among the housekeeping genes. As a result, the 
MLSA analysis was reliable using these housekeeping genes. 

In comparison with the phylogenetic tree of the 16S rRNA 
gene, six phylogenetic trees based on single housekeeping gene 
sequences had high bootstrap values and presented roughly 
congruent topology (Figure S8-S13). In detail, the 58 strains were 
also separated into 15 groups named Group A to O in six 
phylogenetic trees. Some differences were observed in the topology 
at the deepest branching points of six phylogenetic trees. For 
example, the Group D was close to the Group M in the 
phylogenetic tree of the mutL gene; in contrast, it had close 
relationships to the Group A, B and C in other trees. Interestingly, 
the strains of Group A and B were mixed in the phylogenetic trees 
of the rpoD gene, this location conflict implies that the acquisition 
of the ropD gene of strain MCCC 1A1300 or MCCC 1A1330 was 
probably by the lateral gene transfer (LGT) event. Other slight 
differences were also observed in the six phylogenetic trees, as 
shown fuUy in the Figure S8-S13. 
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Figure 1. Phylogenetic tree based on the 16S rRNA gene of 59 r/>a/a£S05p//a bacteria. The tree was constructed using the neighbor-joining 
method with MEGA 5.0. Bootstrap values over 50% (1000 replications) were shown at each node. Bar, % estimated substitution. Rhodospirillum 
rubrum ATCC 1 1 1 70^ (NR_074249) was used as the outgroup. 
doi:1 0.1 371/journal.pone.01 06353.g001 
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The MLSA analysis 

The concatenated genes of six protein-coding genes contained 
4,413 bp, 31 alleles and 1,850 polymorphic sites with a mean G+C 
content of 56.0 mol%. The genetic distance of 58 tested strains 
ranged from 0.000 to 0.220 with a mean 0.138. 

The concatenated genes tree showed a similar topology to the 
individual gene tree including 15 Groups from A to O with very 
high bootstrap support (Figure 2). Specifically, the Group A was 
the largest and had five branches including 13 strains with two 
type strains T. xiamenensis M-5^ and T. permensis 
NBRC106175^ (non-valid). The second largest Group B was 
separated into three different branches with 12 strains. The Group 
E contained six strains with three branches including five genetic 
types. The Group H was split into three branches corresponding 
to three genetic types with eight strains. The six type strains, T. 
xianhensis P-4^, T. alkalilolerans JCM 1 8968''', T. profundimaris 
WP02ir^', T. Leptdiphila I-IB'*^, T. lucentensis DSM 14000^^', and 
T. mesophila JCM 18969^, formed respectively a separate group 
marked as Group C, D, J, K, M and O. The rest of five minority 
groups (Group F, G, I, L and N) and Group B, E and H, cannot be 
allocated to any previously described species. Therefore, they are 
new taxon, representing eight potential novel species. 

Correlation analysis between similarities of the MLSA and 
DDH, ANI, and the evaluation of radiation times 

In order to further confirm the accuracy of the MLSA analysis, 
the DDH values of the 1 6 representative strains of the 1 3 groups 
except the two newly described species (Group D and Group O) 
were calculated in silico from draft genomes (Table S5). 
Considering the 70% DDH values as a gold standard for the 
species boundary in prokaryotc taxonomy, 16 strains of the 
Thalassospira genus were classific-d into the 13 species txirre- 
sponding to the 1 3 group of th(; MLSA analysis. And then, the 
intraspecies and interspecies similarities of the individual gene and 
the concatenated genes among 58 strains were determined based 
on the species given by DDH values and shown in Figure 3. 
Disappointingly, identity values of the intraspecies and the 
interspecies partially overlapped in the most individual genes 
except the mulL gene. The overlap of identity values was 
encountered by the strains between Group B and C, and among 
the strains of Group J, K and L, which shared the high interspecies 
identity values, with the low intraspecies identity values existing in 
the strains within Group A, B or H. In contrast, identity values of 
the concatenated genes exhibited a distinct demarcation (96.16- 
97.12%) between intraspecies and interspecies. Consequently, 
compared with the single gene, the MLSA has a great advantage 
in distinguishing the Thalassospira bacteria. 

The correlation between the identity of the concatenated genes 
and DDH in 16 strains was determined by a nonlinear simulate 
analysis method. As can be seen in Figure SI 4, the identity values 
of the six concatenated housekeeping genes was highly correlated 
(R^ = 0.9661) with the DDH values. Based on the simulative 
logarithmic equation (y = 1 1.87*hrx+46.58), DDH value of 70% 
was equivalent to 97% identity of the MLSA, suggesting that 97% 
identity of the MLSA as interspecies gap can be used for the cut- 
off of the species for the Thalassospira bacteria. In addition, as 
shown in Table S6, the ANIm vahu's of intc'rsjx'cics ranged from 
83.34% to 95.76%, whereas the values of intraspecies were from 
97.20% to 97.93'M). These ANIm values were in accordance with 
standard ANI criteria for species identity (95-96%) [29]. 

The radiation times of each other in 58 strains of the genus 
Thalassospira were estimated using the rate of synonymous 
substitution for six housekeeping genes. On interspecies level, 
the radiation time between the strain MCCC 1A02803 (repre- 



senting the Group L) and the strain MCCC 1A03514 (represent- 
ing the Group K) was the lowest, about 22.21 million years. By 
contrast, these two strains had highly related genome, with 64.3% 
DDH values. This result indicates that the differentiation of two 
young species started at 22.21 miUion years ago. On intraspecies 
level, the highest radiation time between the strain MCCC 
1A00624 and MCCC 1A01300 from tiie same group was 19.30 
million years, the DDH value of the two strains was 80.7%. 
Certainly, these two strains would be divided into different species 
only several millions years ago. A common ancestor of the strains 
in the genus Thalassospira might occur 161.48 milUon years ago 
(Figure SI 5). 

Biogeography analysis 

In this study, the geographical locations of the 57 strains of the 
genus Thalassospira except of T. permensis NBRC106175^ 
covered a wide range of marine environments. According to the 
origin of these strains, four large areas, the coastal area (circle), the 
Pacific Ocean (triangle), the Indian Ocean (square) and the 
Atiantic Ocean (diamond), were marked using four symbols in 
MLSA phylogenetic tree, respectively. As shown in Figure 2, 
strains from the coastal area and the Indian Ocean clustered 
respectively together to some extent; for instance strains of Group 
B, L and N were from coastal area, Group E, F, G, H and I were 
from Indian Ocean; whereas the strains from the Pacific Ocean 
and the Atlantic Ocean scattered in several groups in the MLSA 
phylogenetic tree. 

In addition, to explore the affection of seawater depth, the 
habitats were artificially partitioned into the upper layer (0-100 m 
water depth) and the deeper layer (668^,480 m water depth) and 
marked in green and black, respectively, in the MLSA phyloge- 
netic tree (Figure 2). Intriguingly, the most strains tended to cluster 
together to a certain degree according to the habitats depth. In 
detail, tiie strains of the Group B (not including MCCC 1A01013 
and MCCC 1 AOlOl 7) and the Group C were all isolated from tiie 
upper layer (marked in Green) and gathered together, while the 
strains of the Group E, F, G, H, I and J came from the deeper 
layer (marked in Black) got together. These results indicate that the 
water depth is one of the key environmental factors for affecting 
the differentiation of the Thalassospira bacteria. 

Discussion 

Bacteria of the genus Thalassospira frerjuently occur in various 
marine environments,such as seawater, sediment, halobios from 
every oceans and seas, and frequentiy retrieved from the bacterial 
communiti(;s enriched using PAHs or crude oil. Recently, they 
were found to involve the phosphorus cycUng in marine 
environments [13,14]. The accumulations of isolates of this genus 
urge the systematic phylogeny analyses. In this report, the genetic 
diversity and geographic distribution of the Thalassospira bacteria 
were analyzed for the first time using many approaches including 
the MLSA with six protein-coding genes, DDH and ANI. 

In this study, we demonstrated that MLSA was a powerful 
method for the discrimination, classification and phylogenetic 
analysis of the Thalassospira bacteria, which cannot be distin- 
guished by the 16S rRNA gene. The similar situations were 
frequently encountered in other genera, such as Bacillus, Vibrio, 
Pseudomonas, Enlerohacter and Aeromonas etc. [20,24,25,42,43]. 
Recently, the single housekeeping gene (gyrB, rpoB, pyrE, aroE 
and so on) replaced the 16S rRNA gene and was widely used as a 
phylogentic marker for differentiating the closely related strains 
[21,24,43,44]. Due to the horizontal gene transfer [45] and genetic 
recombination [25], single housekeeping gene may distort the 
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Figure 2. Phylogenetic tree of the MLSA based on six concatenated housekeeping genes of the genus Thalassospira. The tree was 
constructed using the neighbor-joining method with MEGA 5.0. Bootstrap values over 50% (1000 replications) were shown at each node. Bar, % 
estimated substitution. Rhodospirillum rubrum ATCC 1 1 1 70^ (NC_007643) was used as the outgroup. Meantime, the isolated areas and water depth of 
the Thalassospira bacteria were divided artificially and marked by different symbols or colors. 
doi:10.1371/journal.pone.0106353.g002 



phylogenetic tree. In this report, for example, the strain MCCC 
1A02616 occurred in the horizontal gene transfer in the rpoD 
gene; while the MLSA based on several housekeeping genes are 
more reliable. In this study, the 58 strains of the genus 
Thalassospira were split into 15 groups using the MLSA; each of 
the seven groups (Group A, C, D, J, K, M and O) represented by a 
described type strain, and eight groups (Group B, E, F, G, H, I, L 
and N) represented a potential novel species awaiting for further 
classification of polyphasic taxonomy (Six strains of the Group E 
probably belong to recently described species T. povalilytica). In 
addition, DDH value of 70% was used as the gold standard for 
bacterial species delineation [46]. In this report, we chose 16 
representative strains from different groups for DDH and ANI 



calculation. As a result, the DDH values verified the results of 
MLSA analysis, and in addition defined the interspecies gap for 
the genus Thalassospira, with an identity range from 96.16% to 
97.12% based on six housekeeping genes. The interspecies gap of 
ANI value ranged from 95.76% to 97.20%. In the recent studies, 
the classification and the genetic diversity of the closely related 
strains in the intragenus were explored by the MLSA combined 
with the DDH [21,42,47,48]. Therefore, the MLSA is an effective, 
straightforward, reproducible and comparable tool to explore the 
taxonomy of the bacteria, to study the diversity of the strains from 
various environments, to understand evolution of species. 

The isolated origins of all culturable strains in our lab laboratory 
and the NCBI database indicate that they should be the obligate 



PLOS ONE I www.plosone.org 



8 



September 2014 | Volume 9 | Issue 9 | el 06353 



Multilocus Sequence Analysis of Thalassospira Bacteria 



16SrDNA acsA aroE gyrB mutL rpoD trpB MLSA 




Intraspecies {DDH>70%) 
o Interspecies {50%<DDH<70%) 
o Interspecies {DDH<50%) 



Gene(s) 

Figure 3. Intraspecies and interspecies identity of 16S rRNA gene and liousel<eeping gene(s) of tKie genus Thalassospira bacteria. The 

identity values of each other were displayed with three color circles (intraspecies: green, interspecies: blue/red). 
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marine bacteria though the non-vahdly published species, T. 
permensis SMB34''', was isolated from primitive technogeneous 
soil formed on salt-mine spoils at the Verchnekamsk deposit of 
potassium magnesium salts (Berezniki, Perm region, Russia). The 
geographic distribution patterns showed that the bacteria from the 
coastal area clustered together to some extent, as well as those 
from the Indian Ocean. However, the strains from the Pacific 
Ocean and the Atlantic Ocean scattered in the MLSA phyloge- 
netic tree (Fig. 3). Many similar studies have been reported in 
recent years. The bacteria of Bacillus puniilus group from marine 
and terrestrial environments clustered respectively based on the 
gyrB gene phylogenetic tree [24]. Likewise, Khan et al. discovered 
that the oceanic Pesudomonas aeruginosa strains trended distinct 
to cluster from those of other P. aeruginosa strains [44]. In 
addition, The Vibrio parahaemolyticus bacteria from different 
epidemiological sources in Thailand revealed a degree of 
clustering in phylogenetic analysis [49] . The inherent mechanism 
of the intriguing clustering in different origin isolates remains 
unclear, and then explores further in future investigations. 

Obviously in the tree (Fig. 3), two ecotypes were separated from 
each other according to the water depths. Thus, the surface 
ecotype and deep-sea ecotype naturally formed in the vast marine 
system. Our results were consistent with the previous reports. For 
example, the marine bacterium Alteromonas macleodii are found 
to be generally clustered with the depth in the water column from 
which the isolate originated [50]. Klochko et al found that 
peculiarities of A. macleodii strains reflected their deep/surface 
habitation rather than geographical distribution [51]. Similarly, 



Tarasov et al exposed significant differences between deep and 
shallow-water hydrothermal vent communities [52]. 

In summary, we reported the diversity and biogeography of 
Thalassospira bacteria from diverse marine environments for the 
first time, based on the MLSA, DDH and ANI. Fifteen distinct 
lineages corresponding to seven known and eight potential novel 
taxonomic groups were revealed, and the interspecies gap of 
MLSA for the genus Thalassospira is 96.16-97.12%. Interestingly, 
the Thalassospira bacteria exhibited surface and deep ecotypes 
according to the water depths. These results help to understand 
their adaptive evolution in marine environments. Further poly- 
phasic characterization and comparative genomic analysis are just 
under study to understand their role in marine environments. 

Supporting Information 

Figure SI The map of geographical distribution of the 
58 strains from various marine environments. Each red 
dot represents a strain, some dots overlapped. 
(DOCX) 

Figure S2 Split decomposition analysis of the acsA 
gene. 

(DOCX) 

Figure S3 Split decomposition analysis of the aroE 
gene. 

(DOCX) 

Figure S4 Split decomposition analysis of the gyrB gene. 

(DOCX) 
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gene. 
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Figure S6 Split decomposition analysis of the rpoD gene. 

(DOCX) 

Figure S7 Split decomposition analysis of the trpB gene. 

(DOCX) 

Figure S8 Phylogenetic tree based on the acsA gene. The 

tree was constructed using the neighbor-joining method with MEGA 
5.0. Bootstrap values over 50% (1000 replications) were shown at each 
node. Bar, % estimated substitution. The bacteria oi RhodospmUum 
rubrum ATCC 111 70"^ (NC_007643) was used as the outgroup. 
(DOCX) 

Figure S9 Phylogenetic tree based on the aroE gene. The 

tree was constructed using the neighbor-joining method with MEGA 
5.0. Bootstrap values over 50°/) (1000 replications) were shown at each 
node. Bar, % estimated substitution. The bacteria oi RhodospmUum 
rubrum ATCC 1 1 1 yo'^ (NC_007643) was used as the outgroup. 
(DOCX) 

Figure SIO Phylogenetic tree based on the gyrB gene. 

The tree wa.s constructed using the neighbor-joining method with 
MEGA 5.0. Bootstrap values over 50% (1000 replications) were shown 
at each node. Bootstrap values over 50% (1000 replications) were 
shown at each node. Bar, % estimated substitution. The bacteria of 
Rhodospirillum rubmm ATCC 1 1 170"^ (NC_007643) was used as the 
outgroup. 
(DOCX) 

Figure SIX Phylogenetic tree based on the mutL gene. 

The tree was constructed using the neighbor-joining method with 
MEGA 5.0. Bootstrap values over 50% (1000 replications) were shown 
at each node. Bootstrap values over 50'% (1000 replications) were 
shown at each node. Bar, % estimated substitution. The bacteria of 
Rhodospirillum rubrum ATCC 1 1 1 70^^ (NC_007643) was used as the 
outgroup. 
(DOCX) 

Figure S12 Phylogenetic tree based on the rpoD gene. 

The tree was constructed using the neighbor-joining method with 
MEGA 5.0. Bootstrap values over 50% (1000 replications) were shown 
at each node. Bar, % estimated substitution. The bacteria of 
Rhodospirillum rubrum ATCC 11170'^ (NC_007643) was used as 
the outgroup. 
(DOCX) 



Figure S13 Phylogenetic tree based on the trpB gene. 

The tree was constructed using the neighbor-joining method with 
MEGA 5.0. Bootstrap values over 50% (1000 replications) were 
shown at each node. Bar, % estimated substitution. The bacteria 
of Rhodospirillum rubrum ATCC 1 1 170'^ (NC_007643) was used 
as the outgroup. 
(DOCX) 

Figure S14 The correlation of the DDH and MLSA 
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Figure S15 The clustering tree of the DDH values and 
Radiation time of strains in the three clusters. 
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